Acoustic Modeling of Accented English Speech for Large-vocabulary Speech Recognition
Abstract
In this paper, we present a study on robust speech recognition with respect to accent variations. Differences that characterize accents in speech can be divided into two parts: phonetic and acoustic. We focus on the acoustic differences and on the approaches to acoustic model design and training that can be used to minimize the effect of accent variations on the speech recognition system’s performance. When accented training data is available, a typical approach is to train an acoustic model for each accent and use them in parallel. Another way is to pool all data together and train one model with more parameters, assuming that accent variations can be learned by the training algorithm. We compared both of these approaches with a method based on the hybrid HMM/Bayesian Network (HMM/BN) framework using a database consisting of speech from the three major accents of English: American, British and Australian. The results of our experiments show that in the matched accent case, the accent-dependent acoustic models perform the best. However, if the accent is unknown, for models with a small number of parameters, the pooled data training approach is preferable. In contrast, when the amount of data allows for training models with a relatively large number of parameters, the HMM/BN model is the best choice.
Similar resources
English Alphabet Recognition Based on Chinese Acoustic Modeling
How to effectively recognize English letters spoken by Chinese people is our major concern in this paper. Some efforts are made to build Chinese extended Initial/Final (XIF) based HMMs for English alphabet recognition, which can be integrated with a large vocabulary continuous Chinese speech recognition (Chinese LVCSR) system based on the same XIF set. The alphabet-specific XIF HMMs are built using c...
Lexical and Acoustic Adaptation for Multiple Non-Native English Accents
This work investigates the impact of non-native English accents on the performance of a large vocabulary continuous speech recognition (LVCSR) system. Based on the GlobalPhone corpus [1], a speech corpus was collected consisting of English sentences read by native speakers of Bulgarian, Chinese, German and Indian languages. To accommodate non-native pronunciations, two directions are follo...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
Automatic Recognition of Cantonese-English Code-Mixing Speech
Code-mixing is a common phenomenon in bilingual societies. It refers to the intra-sentential switching of two different languages in a spoken utterance. This paper presents the first study on automatic recognition of Cantonese-English code-mixing speech, which is common in Hong Kong. This study starts with the design and compilation of code-mixing speech and text corpora. The problems of acoust...
Acoustic modelling of English-accented and Afrikaans-accented South African English
In this paper we investigate whether it is possible to combine speech data from two South African accents of English in order to improve speech recognition in any one accent. Our investigation is based on Afrikaans-accented English and South African English speech data. We compare three acoustic modelling approaches: separate accent-specific models, accent-independent models obtained by straight...